[P/D] NIXL Integration #17751
* [Update] LMcache connector v1 implementation Signed-off-by: ApostaC <yihua98@uchicago.edu> * [Add] examples for disaggregated prefill Signed-off-by: ApostaC <yihua98@uchicago.edu> * [add] extra information about evns Signed-off-by: ApostaC <yihua98@uchicago.edu> * Initial stubs for P/D scheduling changes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Updates Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Rs branch (#3) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Rs branch (#5) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Remove Unneeded Arguments (#7) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * cleanup Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Improve disagg-example.sh (#8) - fix spelling - CUDA_VISIBLE_DEVICES should be set externally Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * added connector Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com 
<robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * update Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * remove Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * seems to load properly Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Revert "updated" This reverts commit 97316d9. 
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * added Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * diffs for local dev on macos Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * update Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updaed Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Checkpoint. Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * WIP Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated on scheduler side Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Hacking away Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * cleanup Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * ensure request removed from running list Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Runs E2E. Garbage output. 
Crashes on 2nd request Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * rename files Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * update Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Second request no longer crashes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Remove gpu_model_runner hacks Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Clean up Justfile Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * justfile edits Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Fixes - lm_eval gsm8k has correctness Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * "just delete the assert" Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * fixup precommit issues Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Fixes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated (#12) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Add Accuracy Test (#13) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: 
rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Preemption Bugfixes (#15) * stash fixed double free issue Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * fixed issue Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated (#16) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Fix Bad Merge | Fix Memory Leak in Upstream (#18) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * fix merge Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * clean up justfile, examples 
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * more cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * more cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * more cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * more cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * More cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * more cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * more cleanup, precommit fixes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * More cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * run_accuracy_test.sh UX Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * squash warnings Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * pre-commit Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Add get_finished to base kv connector Signed-off-by: mgoin <mgoin64@gmail.com> * revert test.txt Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * review comments Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> --------- Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* mypy Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* update typing Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
---------
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* [V1] Support multiple kv connectors Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com>
* Example script Signed-off-by: mgoin <mgoin64@gmail.com>
* . Signed-off-by: mgoin <mgoin64@gmail.com>
* Add test Signed-off-by: mgoin <mgoin64@gmail.com>
* make mypy happy Signed-off-by: mgoin <mgoin64@gmail.com>
* move MultiKVConnectorMetadata to multi_connector.py Signed-off-by: Nick Hill <nhill@redhat.com>
* minor simplifications Signed-off-by: Nick Hill <nhill@redhat.com>
* Remove script Signed-off-by: mgoin <mgoin64@gmail.com>
* michael inprogress Signed-off-by: Nick Hill <nhill@redhat.com>
* Make sure we pop requests from connector dict Signed-off-by: mgoin <mgoin64@gmail.com>
* req_id -> request_id Signed-off-by: mgoin <mgoin64@gmail.com>
---------
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
* Test xPyD Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
* backwards compatibility Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
* settable from env Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
---------
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
To verify load precedence behavior Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>
* [Update] LMcache connector v1 implementation Signed-off-by: ApostaC <yihua98@uchicago.edu> * [Add] examples for disaggregated prefill Signed-off-by: ApostaC <yihua98@uchicago.edu> * [add] extra information about evns Signed-off-by: ApostaC <yihua98@uchicago.edu> * Initial stubs for P/D scheduling changes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Updates Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Rs branch (#3) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Rs branch (#5) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Remove Unneeded Arguments (#7) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * cleanup Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Improve disagg-example.sh (#8) - fix spelling - CUDA_VISIBLE_DEVICES should be set externally Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * added connector Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com 
<robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * update Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * remove Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * seems to load properly Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Revert "updated" This reverts commit 97316d9. 
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * added Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * diffs for local dev on macos Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * update Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updaed Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Checkpoint. Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * WIP Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated on scheduler side Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Hacking away Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * cleanup Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * ensure request removed from running list Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Runs E2E. Garbage output. 
Crashes on 2nd request Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * rename files Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * update Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Second request no longer crashes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Remove gpu_model_runner hacks Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Clean up Justfile Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * justfile edits Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Fixes - lm_eval gsm8k has correctness Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * "just delete the assert" Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * fixup precommit issues Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Fixes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated (#12) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Add Accuracy Test (#13) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: 
rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Preemption Bugfixes (#15) * stash fixed double free issue Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * fixed issue Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated (#16) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Fix Bad Merge | Fix Memory Leak in Upstream (#18) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * fix merge Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: 
rshaw@neuralmagic.com <robertgshaw2@gmail.com> * cleanup code Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * cleanup code Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatted Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * revert Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * more spurious changes Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com 
<robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> * Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> --------- Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * add test Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * add test Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
* [Update] LMcache connector v1 implementation Signed-off-by: ApostaC <yihua98@uchicago.edu> * [Add] examples for disaggregated prefill Signed-off-by: ApostaC <yihua98@uchicago.edu> * [add] extra information about evns Signed-off-by: ApostaC <yihua98@uchicago.edu> * Initial stubs for P/D scheduling changes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Updates Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Rs branch (#3) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Rs branch (#5) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Remove Unneeded Arguments (#7) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * cleanup Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Improve disagg-example.sh (#8) - fix spelling - CUDA_VISIBLE_DEVICES should be set externally Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * added connector Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com 
<robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * update Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * remove Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * seems to load properly Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Revert "updated" This reverts commit 97316d9. 
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * added Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * diffs for local dev on macos Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * update Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updaed Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Checkpoint. Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Cleanup Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * WIP Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated on scheduler side Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Hacking away Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * cleanup Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * ensure request removed from running list Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Runs E2E. Garbage output. 
Crashes on 2nd request Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * rename files Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * updated Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * update Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> * Second request no longer crashes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Remove gpu_model_runner hacks Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Clean up Justfile Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * justfile edits Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Update Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Fixes - lm_eval gsm8k has correctness Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * "just delete the assert" Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * fixup precommit issues Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Fixes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * updated (#12) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Add Accuracy Test (#13) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: 
rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Preemption Bugfixes (#15) * stash fixed double free issue Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * fixed issue Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatrd Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated (#16) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Fix Bad Merge | Fix Memory Leak in Upstream (#18) * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * fix merge Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: 
rshaw@neuralmagic.com <robertgshaw2@gmail.com> * cleanup code Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * cleanup code Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * stash Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updatted Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * revert Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * more spurious changes Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com 
<robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * Support MLA in NIXL connector Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * WIP adding tests Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * wip Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Fixes Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> --------- Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
…ge_main [2025-5-5] Upstream sync
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
@robertgshaw2-redhat question: Does the P instance block until the handshake happens? The proxy suggests that the prefill instance does not need to know about the decode instance when a request is sent to it. Can you explain, at a high level, the data and control flow of this architecture?
        for block in self.single_type_manager.req_to_blocks[request_id]
    ]

    def get_num_blocks(self, request_id: str):
Seems that this function is not used?
you're right, I will remove this.
Here's the doc: |
|
@robertgshaw2-redhat thanks, requested access. |
The KVConnector API has been updated after the NIXL integration was merged in this PR: vllm-project/vllm#17751 Update VUA's method signatures to match the new API. Signed-off-by: diabloneo <diabloneo@gmail.com>
Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>
Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
|
@robertgshaw2-redhat Can you open access to this document? https://docs.google.com/document/d/1lDQt6hUXBoMnMabsuAMZAWdqit7TzuflsFGHLt2gZus/edit?tab=t.0
@robertgshaw2-redhat Hi, can you grant me access? I have requested access.
|
Hello, and excuse me. While learning about the NIXL integration, I had a question about this PR. How does WAITING_FOR_REMOTE_KVS work in P/D disaggregation, and why is it different from other solutions (such as the simple connector)? Looking forward to your reply. Thank you very much.
Hey, can you grant me access?
Hi, can you grant me access?
|
Hi @robertgshaw2-redhat, would you mind granting my request to access the design doc? Thanks so much!
SUMMARY:
FOLLOW UPS: